*Decreased Main Column Overhead Condensor Efficiency*¶

Introduction¶

The FCC (Fluidized Catalytic Cracking) condenser is a critical component in refinery operations. It process sweet Vacuum Gas Oil as a feed and cracks heavy oil into lighter oils like LPG and Gasoline.

Any reduction in efficiency can indicate operational issues such as fouling, leaks, or mechanical failure. This project applies Machine Learning (ML) techniques to detect anomalies in FCC condenser efficiency.

image.png

Data Preprocessing¶

The dataset contains operational parameters, including temperature, pressure, flow rates, and velocities. The following preprocessing steps were performed:

  1. Handling Missing Values: Missing values were filled using mean imputation.
  2. Feature Scaling: Standardized using StandardScaler.

Exploratory Data Analysis (EDA)¶

EDA was conducted to visualize trends and detect potential anomalies. Key findings:

  1. Line plot revealed variations in parameters like Temperature, Pressure, Flowrate, etc.
  2. PCA highlighted underlying structure in the dataset.

Anomaly Detection Techniques¶

1. Principal Component Analysis (PCA)

PCA reduced high-dimensional data to two principal components.
Anomalies were detected by analyzing data points far from the normal cluster.

2. Autoencoder (Deep Learning)

A neural network trained to reconstruct normal data patterns.
Reconstruction error was used to flag high-error data points as anomalies.

Data Cleaning, Feature Engineering, Predictive Modeling¶

Importing libraries and datasets

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
plt.rcParams['figure.figsize'] = [12, 6]
plt.rcParams.update({'font.size': 12})
In [2]:
df = pd.read_excel('columns.xlsx')
col = df['Symbol'].values
col = col.tolist()
In [3]:
df_stableFeedFlow = pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/NOC_stableFeedFlow_outputs.csv',header=None)
df_stableFeedFlow.set_index = df_stableFeedFlow.iloc[:,0]
df_stableFeedFlow = df_stableFeedFlow.drop(columns=0)
df_stableFeedFlow.columns= col
df_stableFeedFlow.sample(5)
Out[3]:
F3 Tatm T1 P4 deltaP P6 Fair T3 T2 Tr ... FLCO FSlurry FReflux Tfra T10 T20 V9 V8 V10 V11
632 164.85 79.961 460.98 34.4 -6.4 28 2.6806 1562.7 616.00 969.05 ... 1644.7 212.78 2993.6 314.71 509.76 628.02 45.995 49.644 49.402 47.286
1759 165.28 78.490 460.25 34.4 -6.4 28 2.6872 1569.8 616.01 969.04 ... 1647.0 209.61 3017.6 314.15 510.13 628.50 46.808 50.141 49.038 47.382
849 165.09 79.857 460.68 34.4 -6.4 28 2.6847 1566.0 616.02 969.06 ... 1642.0 210.79 3006.4 314.88 509.92 628.18 45.873 49.829 48.649 47.118
2656 165.14 77.675 461.05 34.4 -6.4 28 2.6848 1564.0 615.99 969.04 ... 1646.1 209.44 2987.6 313.13 509.96 628.23 48.028 49.813 48.804 47.349
1809 165.01 78.825 460.59 34.4 -6.4 28 2.6801 1566.1 616.01 968.99 ... 1641.7 218.10 3003.6 314.25 509.93 628.41 46.728 49.886 49.767 47.099

5 rows × 46 columns

In [4]:
df_varyingFeedFlow=pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/NOC_varyingFeedFlow_outputs.csv',header=None)
df_varyingFeedFlow.set_index = df_varyingFeedFlow.iloc[:,0]
df_varyingFeedFlow = df_varyingFeedFlow.drop(columns=0)
df_varyingFeedFlow.columns= col
df_varyingFeedFlow.sample(5)
Out[4]:
F3 Tatm T1 P4 deltaP P6 Fair T3 T2 Tr ... FLCO FSlurry FReflux Tfra T10 T20 V9 V8 V10 V11
3259 164.68 78.932 461.14 34.4 -6.4 28 2.6729 1560.6 616.00 968.96 ... 1635.6 215.15 2915.0 312.66 509.18 627.24 48.220 48.679 49.227 46.820
3602 163.01 79.983 461.18 34.4 -6.4 28 2.6486 1550.1 616.00 969.00 ... 1610.3 227.94 2732.3 310.08 507.40 624.66 50.636 46.035 48.098 45.560
361 166.80 78.746 461.15 34.4 -6.4 28 2.7136 1573.7 616.01 969.08 ... 1675.9 197.06 3256.9 318.27 512.04 631.08 42.794 53.392 50.467 48.867
6655 164.37 79.734 461.51 34.4 -6.4 28 2.6710 1556.4 616.00 969.02 ... 1629.5 211.65 2875.8 312.44 508.82 626.51 48.268 48.059 48.031 46.516
6216 162.63 79.331 460.62 34.4 -6.4 28 2.6415 1551.1 615.99 968.99 ... 1603.4 226.02 2637.2 307.75 506.56 623.31 53.173 44.809 47.564 45.250

5 rows × 46 columns

In [5]:
df_condEff_decrease = pd.read_csv(r'https://raw.githubusercontent.com/AshuPraja13/FCC-abnormality-detection/main/condEff_decrease_outputs.csv',header=None)
df_condEff_decrease.set_index = df_condEff_decrease.iloc[:,0]
df_condEff_decrease = df_condEff_decrease.drop(columns=0)
df_condEff_decrease.columns= col
df_condEff_decrease.sample(5)
Out[5]:
F3 Tatm T1 P4 deltaP P6 Fair T3 T2 Tr ... FLCO FSlurry FReflux Tfra T10 T20 V9 V8 V10 V11
545 164.88 79.732 460.44 34.4 -6.4 28 2.6792 1566.1 616.00 969.01 ... 1640.9 216.86 3044.1 317.97 509.83 628.21 42.217 49.937 49.582 47.062
1427 164.99 75.169 460.74 34.4 -6.4 28 2.6798 1565.0 616.00 969.00 ... 1638.4 214.28 3025.6 317.74 509.68 627.93 42.521 49.668 49.428 46.934
1114 165.01 78.505 461.15 34.4 -6.4 28 2.6823 1562.6 615.99 969.02 ... 1650.1 211.67 3094.6 320.84 509.93 628.25 39.116 50.300 50.083 47.580
340 164.59 78.586 460.67 34.4 -6.4 28 2.6733 1562.9 615.99 968.98 ... 1640.6 223.55 2993.8 315.26 509.62 628.10 45.376 49.551 50.644 47.058
1051 165.03 78.948 460.82 34.4 -6.4 28 2.6820 1564.8 616.00 969.02 ... 1641.1 212.26 3075.8 320.81 509.77 627.99 39.079 50.000 49.319 47.085

5 rows × 46 columns

EDA

In [6]:
df_condEff_decrease.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1440 entries, 0 to 1439
Data columns (total 46 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   F3           1440 non-null   float64
 1   Tatm         1440 non-null   float64
 2   T1           1440 non-null   float64
 3   P4           1440 non-null   float64
 4   deltaP       1440 non-null   float64
 5   P6           1440 non-null   int64  
 6   Fair         1440 non-null   float64
 7   T3           1440 non-null   float64
 8   T2           1440 non-null   float64
 9   Tr           1440 non-null   float64
 10  Treg         1440 non-null   float64
 11  Lsp          1440 non-null   float64
 12  Tcyc         1440 non-null   float64
 13  Tcyc - Treg  1440 non-null   float64
 14  Cco,sg       1440 non-null   int64  
 15  Co2,sg       1440 non-null   float64
 16  P5           1440 non-null   float64
 17  V4           1440 non-null   float64
 18  V6           1440 non-null   float64
 19  V7           1440 non-null   float64
 20  V3           1440 non-null   float64
 21  V1           1440 non-null   float64
 22  V2           1440 non-null   float64
 23  Frgc         1440 non-null   int64  
 24  Fsc          1440 non-null   int64  
 25  ACAB         1440 non-null   float64
 26  AWGC         1440 non-null   float64
 27  F5           1440 non-null   float64
 28  F7           1440 non-null   float64
 29  Fsg          1440 non-null   float64
 30  FV11         1440 non-null   int64  
 31  P1           1440 non-null   float64
 32  P2           1440 non-null   float64
 33  FLPG         1440 non-null   float64
 34  FLN          1440 non-null   float64
 35  FHN          1440 non-null   float64
 36  FLCO         1440 non-null   float64
 37  FSlurry      1440 non-null   float64
 38  FReflux      1440 non-null   float64
 39  Tfra         1440 non-null   float64
 40  T10          1440 non-null   float64
 41  T20          1440 non-null   float64
 42  V9           1440 non-null   float64
 43  V8           1440 non-null   float64
 44  V10          1440 non-null   float64
 45  V11          1440 non-null   float64
dtypes: float64(41), int64(5)
memory usage: 517.6 KB
In [7]:
df_condEff_decrease.describe().T
Out[7]:
count mean std min 25% 50% 75% max
F3 1440.0 164.964931 1.705031e-01 164.480000 164.840000 164.970000 165.080000 165.520000
Tatm 1440.0 78.342074 1.495206e+00 75.014000 77.206250 78.782000 79.699250 80.057000
T1 1440.0 460.921306 3.773500e-01 459.880000 460.660000 460.910000 461.190000 461.930000
P4 1440.0 34.400000 8.884870e-13 34.400000 34.400000 34.400000 34.400000 34.400000
deltaP 1440.0 -6.400001 1.366802e-05 -6.400300 -6.400000 -6.400000 -6.400000 -6.400000
P6 1440.0 28.000000 0.000000e+00 28.000000 28.000000 28.000000 28.000000 28.000000
Fair 1440.0 2.679885 3.071531e-03 2.671800 2.677700 2.680100 2.682000 2.689500
T3 1440.0 1563.718958 2.493819e+00 1557.700000 1561.900000 1563.500000 1565.200000 1571.900000
T2 1440.0 615.999910 6.663159e-03 615.980000 616.000000 616.000000 616.000000 616.020000
Tr 1440.0 968.999750 3.104291e-02 968.910000 968.980000 969.000000 969.020000 969.110000
Treg 1440.0 1250.001528 5.095774e-02 1249.900000 1250.000000 1250.000000 1250.000000 1250.100000
Lsp 1440.0 29.653098 9.399890e-02 29.376000 29.586000 29.655000 29.712000 29.905000
Tcyc 1440.0 1255.278958 5.024220e-02 1255.200000 1255.200000 1255.300000 1255.300000 1255.400000
Tcyc - Treg 1440.0 5.278488 3.785116e-02 5.186900 5.250175 5.280050 5.306625 5.388200
Cco,sg 1440.0 29881.611806 4.865322e+01 29737.000000 29846.000000 29880.000000 29917.250000 30009.000000
Co2,sg 1440.0 0.012470 1.691557e-04 0.012067 0.012344 0.012476 0.012599 0.012967
P5 1440.0 24.900000 6.752501e-13 24.900000 24.900000 24.900000 24.900000 24.900000
V4 1440.0 47.584573 1.025714e+00 45.364000 46.808000 47.828000 48.496000 48.969000
V6 1440.0 24.783957 1.008147e-01 24.532000 24.708000 24.797000 24.866250 25.033000
V7 1440.0 54.577801 6.263411e-02 54.413000 54.534000 54.582000 54.621000 54.773000
V3 1440.0 46.982188 1.595010e-02 46.937000 46.971000 46.982000 46.993000 47.025000
V1 1440.0 57.909178 1.800725e-01 57.470000 57.778500 57.893000 58.018500 58.493000
V2 1440.0 45.315724 5.301285e-02 45.177000 45.283000 45.317000 45.350000 45.481000
Frgc 1440.0 49572.265972 6.081201e+01 49407.000000 49529.000000 49576.000000 49614.000000 49765.000000
Fsc 1440.0 49572.208333 6.126321e+01 49411.000000 49529.000000 49577.000000 49614.000000 49764.000000
ACAB 1440.0 280.687479 1.405453e+00 277.550000 279.710000 281.055000 281.910000 282.860000
AWGC 1440.0 213.537201 6.912833e+00 198.490000 208.370000 215.240000 219.650000 222.610000
F5 1440.0 1989.637292 6.187052e+00 1974.600000 1985.175000 1989.100000 1993.400000 2009.700000
F7 1440.0 3735.766111 5.533026e+00 3722.500000 3731.800000 3735.800000 3739.800000 3752.000000
Fsg 1440.0 160.793049 1.845840e-01 160.310000 160.660000 160.810000 160.920000 161.370000
FV11 1440.0 29078.619444 7.483565e+02 27433.000000 28523.000000 29265.000000 29738.250000 30068.000000
P1 1440.0 14.637990 1.015644e-04 14.637000 14.638000 14.638000 14.638000 14.638000
P2 1440.0 35.044537 2.357305e-02 34.995000 35.026000 35.040000 35.063000 35.101000
FLPG 1440.0 3199.203681 1.250097e+02 2929.900000 3104.475000 3228.800000 3310.200000 3366.300000
FLN 1440.0 3751.406806 1.255117e+02 3582.700000 3640.875000 3721.100000 3844.300000 4029.600000
FHN 1440.0 711.221708 4.431817e+00 698.640000 708.587500 710.950000 713.720000 722.810000
FLCO 1440.0 1642.687292 4.569204e+00 1630.300000 1639.400000 1642.600000 1646.100000 1654.900000
FSlurry 1440.0 214.302257 4.559766e+00 201.700000 210.787500 214.325000 217.742500 225.080000
FReflux 1440.0 3038.856389 4.980395e+01 2923.500000 2999.575000 3050.200000 3077.500000 3127.300000
Tfra 1440.0 317.874403 3.343519e+00 310.700000 315.350000 318.700000 320.820000 322.170000
T10 1440.0 509.777910 1.371906e-01 509.380000 509.680000 509.770000 509.870000 510.130000
T20 1440.0 628.078812 1.482168e-01 627.680000 627.990000 628.080000 628.180000 628.400000
V9 1440.0 42.438248 3.860931e+00 37.629000 39.092250 41.394500 45.168750 51.247000
V8 1440.0 49.872215 3.110262e-01 49.160000 49.666000 49.880500 50.083500 50.627000
V10 1440.0 49.723878 6.232966e-01 47.978000 49.351750 49.678500 50.085000 51.357000
V11 1440.0 47.170649 2.520129e-01 46.496000 46.984000 47.164000 47.354250 47.844000
In [8]:
sns.heatmap(df_condEff_decrease.corr(),cmap='coolwarm')
Out[8]:
<AxesSubplot:>
In [9]:
for n,i in enumerate(df_stableFeedFlow.columns):
    plt.figure(figsize=(12,2))
    plt.plot(df_stableFeedFlow[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol']==i]['Description'].values)
    plt.show()
In [10]:
for n,i in enumerate(df_varyingFeedFlow.columns):
    plt.figure(figsize=(12,2))
    plt.plot(df_varyingFeedFlow[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol']==i]['Description'].values)
    plt.show()
In [11]:
for n,i in enumerate(df_condEff_decrease.columns):
    plt.figure(figsize=(12,2))
    plt.plot(df_condEff_decrease[i])
    plt.xlabel('time (mins)')
    plt.ylabel(i)
    plt.title(df[df['Symbol']==i]['Description'].values)
    plt.show()

Scaling the data with mean=0 & std = 1 using Standard Scalar.

In [12]:
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
In [13]:
X = ss.fit_transform(df_stableFeedFlow)

Applying PCA

In [14]:
from sklearn.decomposition import PCA
pca = PCA()
In [15]:
X_pca = pca.fit_transform(X)
In [16]:
plt.figure(figsize=(15,6))
sns.set_style('whitegrid')
sns.lineplot(x=list(range(1,47)), y=np.cumsum(pca.explained_variance_ratio_), drawstyle='steps-pre')
sns.lineplot(x=list(range(0,46)),y=np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Eigen Values')
plt.ylabel('Ratio of Variance')
plt.title('Variance by each Eigen Value')
plt.show()

It can be clearly seen than 10 dimentions can describe more than 98% data, hence redcing the feature space from 46 to 10.

In [17]:
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X)
In [18]:
plt.figure(figsize=(15,6))
sns.set_style('whitegrid')
sns.lineplot(x=list(range(1,11)), y=np.cumsum(pca.explained_variance_ratio_), drawstyle='steps-pre')
sns.lineplot(x=list(range(0,10)),y=np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Eigen Values')
plt.ylabel('Ratio of Variance')
plt.title('Variance by each Eigen Value')
plt.show()

Applying Autoencoders

In [19]:
X_train = X.reshape(2880,46,1)

Lets create a Sequential model with Bidirectional LSTM & train the model when plant is in steady state.
To avoid overfitting of model by using 20% dropout.

In [20]:
# del model
In [21]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256,return_sequences=True),input_shape=(46,1)))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128,return_sequences=True)))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 bidirectional (Bidirection  (None, 46, 512)           528384    
 al)                                                             
                                                                 
 dropout (Dropout)           (None, 46, 512)           0         
                                                                 
 bidirectional_1 (Bidirecti  (None, 46, 256)           656384    
 onal)                                                           
                                                                 
 dropout_1 (Dropout)         (None, 46, 256)           0         
                                                                 
 dense (Dense)               (None, 46, 1)             257       
                                                                 
=================================================================
Total params: 1185025 (4.52 MB)
Trainable params: 1185025 (4.52 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [22]:
model.fit(X_train,X_train,epochs=30)
Epoch 1/30
90/90 [==============================] - 66s 406ms/step - loss: 0.1392 - mae: 0.1968
Epoch 2/30
90/90 [==============================] - 37s 407ms/step - loss: 0.0027 - mae: 0.0359
Epoch 3/30
90/90 [==============================] - 37s 407ms/step - loss: 0.0023 - mae: 0.0326
Epoch 4/30
90/90 [==============================] - 37s 414ms/step - loss: 0.0021 - mae: 0.0308
Epoch 5/30
90/90 [==============================] - 37s 410ms/step - loss: 0.0020 - mae: 0.0294
Epoch 6/30
90/90 [==============================] - 37s 409ms/step - loss: 0.0018 - mae: 0.0283
Epoch 7/30
90/90 [==============================] - 37s 413ms/step - loss: 0.0018 - mae: 0.0276
Epoch 8/30
90/90 [==============================] - 37s 413ms/step - loss: 0.0017 - mae: 0.0271
Epoch 9/30
90/90 [==============================] - 37s 415ms/step - loss: 0.0017 - mae: 0.0268
Epoch 10/30
90/90 [==============================] - 37s 414ms/step - loss: 0.0016 - mae: 0.0264
Epoch 11/30
90/90 [==============================] - 37s 414ms/step - loss: 0.0016 - mae: 0.0260
Epoch 12/30
90/90 [==============================] - 37s 416ms/step - loss: 0.0016 - mae: 0.0258
Epoch 13/30
90/90 [==============================] - 37s 414ms/step - loss: 0.0016 - mae: 0.0257
Epoch 14/30
90/90 [==============================] - 38s 418ms/step - loss: 0.0015 - mae: 0.0254
Epoch 15/30
90/90 [==============================] - 37s 413ms/step - loss: 0.0015 - mae: 0.0251
Epoch 16/30
90/90 [==============================] - 37s 412ms/step - loss: 0.0015 - mae: 0.0249
Epoch 17/30
90/90 [==============================] - 37s 413ms/step - loss: 0.0015 - mae: 0.0249
Epoch 18/30
90/90 [==============================] - 38s 420ms/step - loss: 0.0015 - mae: 0.0248
Epoch 19/30
90/90 [==============================] - 37s 415ms/step - loss: 0.0015 - mae: 0.0247
Epoch 20/30
90/90 [==============================] - 38s 419ms/step - loss: 0.0014 - mae: 0.0247
Epoch 21/30
90/90 [==============================] - 38s 420ms/step - loss: 0.0014 - mae: 0.0244
Epoch 22/30
90/90 [==============================] - 38s 425ms/step - loss: 0.0014 - mae: 0.0245
Epoch 23/30
90/90 [==============================] - 38s 418ms/step - loss: 0.0014 - mae: 0.0244
Epoch 24/30
90/90 [==============================] - 37s 411ms/step - loss: 0.0014 - mae: 0.0244
Epoch 25/30
90/90 [==============================] - 38s 428ms/step - loss: 0.0014 - mae: 0.0242
Epoch 26/30
90/90 [==============================] - 39s 438ms/step - loss: 0.0014 - mae: 0.0243
Epoch 27/30
90/90 [==============================] - 23s 253ms/step - loss: 0.0014 - mae: 0.0242
Epoch 28/30
90/90 [==============================] - 24s 266ms/step - loss: 0.0014 - mae: 0.0242
Epoch 29/30
90/90 [==============================] - 22s 247ms/step - loss: 0.0014 - mae: 0.0243
Epoch 30/30
90/90 [==============================] - 22s 249ms/step - loss: 0.0014 - mae: 0.0243
Out[22]:
<keras.src.callbacks.History at 0x1c447022b50>

Calculating the Reconstruction error using MAE.
Considering 99 percentile of error as an acceptable range, and it signifies the steady state operation.

In [23]:
error_ae = []
for i in range(X.shape[0]):
    y_pred = model.predict(X[i].reshape(1,46,1),verbose=None)[0,:,0]
    error_ae.append(np.abs(X[i]-y_pred).sum())
AE_CL = np.percentile(error_ae,99)
In [24]:
pd.Series(error_ae).plot()
plt.hlines(AE_CL,0,len(error_ae),colors='red',linestyles='--')
Out[24]:
<matplotlib.collections.LineCollection at 0x1c455024850>

Calculating the Reconstruction error using Q-test, T22-test & Cosine similarity.
Considering 99 percentile of error as an acceptable range, and it signifies the steady state operation.

In [25]:
X_reconstructed = np.dot(X_pca,pca.components_)
error_pca = X-X_reconstructed
Q_train = np.sum(np.abs(error_pca),axis=1)
Q_CL = np.percentile(Q_train,99)
# Q_train plot with CL
plt.figure()
plt.plot(Q_train, color='black')
plt.plot([1,len(Q_train)],[Q_CL,Q_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('Q metric: training data')
plt.title(f'Q metrix is max: {Q_train.max()} at:{Q_train.argmax()}mins')
plt.show()
In [26]:
lambda_ = np.diag(pca.explained_variance_)
lambda_inv = np.linalg.inv(lambda_)
T_train = np.zeros(X_pca.shape[0])
for i in range(X_pca.shape[0]):
    T_train[i] = np.dot(np.dot(X_pca[i],lambda_inv),X_pca[i].T)
T_CL = np.percentile(T_train,99)
# T2_train plot with CL
plt.figure()
plt.plot(T_train, color='black')
plt.plot([1,len(T_train)],[T_CL,T_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('T$^2$ metric: training data')
plt.title(f'T$^2$ metrix is max: {np.array(T_train).max()} at:{np.array(T_train).argmax()}mins')
plt.show()
In [27]:
cosine = []
ed = []
for i in range(X.shape[0]):
    v1 = X[i]
    v2 = np.dot(X_pca,pca.components_)[i]
    cosine.append(np.dot(v1,v2)/(np.linalg.norm(v1)*np.linalg.norm(v2)))
    ed.append(np.linalg.norm(v1 - v2))
C_CL = np.min(cosine)
E_CL = np.percentile(ed,99)
# pd.Series(ed).plot(color='black')
# plt.plot([1,len(ed)],[E_CL,E_CL], linestyle='--',color='red', linewidth=2)
# plt.show()
pd.Series(cosine).plot(color='black')
plt.plot([1,len(cosine)],[C_CL,C_CL], linestyle='--',color='red', linewidth=2)
plt.xlabel('Sample #')
plt.ylabel('Cosine similarity metric: training data')
plt.title(f'Cosine Similarity')
plt.show()
In [28]:
Q_CL,T_CL,C_CL,E_CL,AE_CL
Out[28]:
(4.123113215519066,
 20.4243503525688,
 0.9427550112367161,
 0.9281694746755804,
 0.4919710296664876)

Let's create a function for test data preprocessing and testing the data with our model.

In [29]:
def Q_test(X,X_pca,pca_components_,Q_CL):
    X_reconstructed = np.dot(X_pca,pca_components_)
    error_pca = X-X_reconstructed
    Q_train = np.sum(np.abs(error_pca),axis=1)
    # Q_train plot with CL
    plt.figure()
    plt.plot(Q_train, color='black')
    plt.plot([1,len(Q_train)],[Q_CL,Q_CL], linestyle='--',color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('Q metric: training data')
    plt.title(f'Q metrix is max: {Q_train.max()} at:{Q_train.argmax()}mins')
    plt.show()
    return error_pca
In [30]:
def T_test(X_pca,explained_variance_,TCL):
    lambda_ = np.diag(pca.explained_variance_)
    lambda_inv = np.linalg.inv(lambda_)
    T_train = np.zeros(X_pca.shape[0])
    for i in range(X_pca.shape[0]):
        T_train[i] = np.dot(np.dot(X_pca[i],lambda_inv),X_pca[i].T)
    # T2_train plot with CL
    plt.figure()
    plt.plot(T_train, color='black')
    plt.plot([1,len(T_train)],[T_CL,T_CL], linestyle='--',color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('T$^2$ metric: training data')
    plt.title(f'T$^2$ metrix is max: {np.array(T_train).max()} at:{np.array(T_train).argmax()}mins')
    plt.show()
In [31]:
def cosine(X,X_transformed,pca_components_,C_CL,E_CL):
    cosine = []
    ed = []
    for i in range(X.shape[0]):
        v1 = X[i]
        v2 = np.dot(X_transformed,pca_components_)[i]
        cosine.append(np.dot(v1,v2)/(np.linalg.norm(v1)*np.linalg.norm(v2)))
        ed.append(np.linalg.norm(v1 - v2))
#     pd.Series(ed).plot(color='black')
#     plt.plot([1,len(ed)],[E_CL,E_CL], linestyle='--',color='red', linewidth=2)
#     plt.xlabel('Sample #')
#     plt.ylabel('Eucledian Distance metric: training data')
#     plt.show()
    pd.Series(cosine).plot(color='black')
    plt.plot([1,len(cosine)],[C_CL,C_CL], linestyle='--',color='red', linewidth=2)
    plt.xlabel('Sample #')
    plt.ylabel('Cosine similarity metric: training data')
    plt.title(f'Cosine Similarity')
    plt.show()
In [32]:
def autoencoder(df_test,CL):
    X_test = ss.transform(df_test)
    error_ae = []
    error_sum = []
    for i in range(X_test.shape[0]):
        y_pred = model.predict(X_test[i].reshape(1,46,1),verbose=None)[0,:,0]
        error_ae.append(np.abs(X_test[i]-y_pred))
        error_sum.append(np.abs(X_test[i]-y_pred).sum())
    error_ae=np.array(error_ae)
    pd.Series(error_sum).plot(color = 'black')
    plt.hlines(CL,0,len(error_ae),colors='red',linestyles='--')
    plt.xlabel('Sample #')
    plt.ylabel('Reconstruction error by Autoencoder')
    return error_ae

Testing the model on Varying feed flow rate.¶

In [33]:
X = ss.transform(df_varyingFeedFlow)
X_test = pca.transform(X)
In [34]:
error_pca = Q_test(X,X_test,pca.components_,Q_CL)
In [35]:
T_test(X_test,pca.explained_variance_,T_CL)
In [36]:
cosine(X,X_test,pca.components_,C_CL,E_CL)
In [37]:
error_ae = autoencoder(df_varyingFeedFlow,AE_CL)

Testing the model on abnormal dataset.¶

In [38]:
X = ss.transform(df_condEff_decrease)
X_test = pca.transform(X)
In [39]:
error_pca = Q_test(X,X_test,pca.components_,Q_CL)
In [40]:
T_test(X_test,pca.explained_variance_,T_CL)
In [41]:
cosine(X,X_test,pca.components_,C_CL,E_CL)
In [42]:
error_ae = autoencoder(df_condEff_decrease,AE_CL)

Inference

During steady state operation the errors are within limit but, suddenly the error starts increasing after 700mins.
So, let’s check which parameters are deviating maximum form steady state.
Considering top 10 variables responsible for plant deviation.

Visualization¶

Q test Error

In [43]:
#%% Q contribution
error = np.abs(error_pca).sum(axis=1)
cum = []
for index,value in enumerate(error):
    if (value>Q_CL) and (len(cum)<15):
        previous_val = value
        cum.append(value)
        if len(cum) == 15:
            sample = index
            break
    else:
        cum=[]
# sample = ((pd.Series(error_pca.sum(axis=1))-pd.Series(error_pca.sum(axis=1)).shift()).abs()).argmax()
print('Time-',sample,'mins')
error_test_sample = error_pca[sample]
Q_contri = np.abs(error_test_sample) # *error_test_sample # vector of contributions
Time- 219 mins
In [44]:
plt.figure(figsize=[15,4])
plt.bar(['variable ' + str((i+1)) for i in range(len(Q_contri))], Q_contri)
plt.xticks(rotation = 80)
plt.ylabel('Q contributions')
plt.show()
In [46]:
plt.figure(figsize=(15,40))
print('Time-',sample,'mins')
for i,n in enumerate(np.argsort(Q_contri)[:-11:-1]):
    plt.subplot(5,2,i+1)
    plt.plot(df_condEff_decrease.iloc[:,n],'blue', linewidth=1)
    plt.xlabel('time (mins)')
    plt.ylabel(df['Symbol'][n])
    plt.title(df['Description'][n])
    plt.show
Time- 219 mins

Autoecoder Error

In [47]:
#%% Autoencoder Error
error = np.abs(error_ae).sum(axis=1)
cum = []
for index,value in enumerate(error):
    if (value>AE_CL) and (len(cum)<15):
        previous_val = value
        cum.append(value)
        if len(cum) == 15:
            sample = index
            break
    else:
        cum=[]
# sample = ((pd.Series(error_ae.sum(axis=1))-pd.Series(error_ae.sum(axis=1)).shift()).abs()).argmax()
print('Time-',sample,'mins')
error_test_sample = error_pca[sample]
Q_contri = np.abs(error_test_sample) # *error_test_sample # vector of contributions
Time- 481 mins
In [48]:
plt.figure(figsize=(15,45))
print('Time-',sample,'mins')
for i,n in enumerate(np.argsort(error_ae[sample])[:-11:-1]):
    plt.subplot(5,2,i+1)
    plt.plot(df_condEff_decrease.iloc[:,n],'blue', linewidth=1)
    plt.xlabel('time (mins)')
    plt.ylabel(df['Symbol'][n])
    plt.title(df['Description'][n])
    plt.show
Time- 481 mins

Conclusion¶

It is clearly visible from the plot that LPG flow rate is continuously increasing and LN flow rate is decreasing. Consequently, the fractionator temperature & pressure CV is also opening & WGC flow, reflux flow, receiver LV starts closing.
Form this we can conclude that the condensation in MC O/H condensers is probably reduced which results in an increase in LPG flow rate(vapour) and decrease in LN flow rate(liquid).
This increase vapour flow rate increases the pressure of fractionator and load of WGC.

Future Work¶

  1. Integrate with IoT sensors for real-time anomaly tracking.
  2. Develop a predictive maintenance dashboard using Power BI.